Learning Objectives

This section covers the following topics:

  • Creation of sf::LINESTRING and sf::MULTIPOLYGON objects from raw data.

  • Loading map polygons from get_data()

  • Importing data from ESRI shapefile.

  • Implementing Spatial Joins.

Data Sources: De novo Creation

It is possible to make data from scratch by entering coordinates directly.

df <- data.frame( id = c(rep("Rodney",5), rep("Sarah",5)),
                  Data = rnorm( 10, 42, 42),
                  Longitude = rnorm(10, -78, 1 ),
                  Latitude = rnorm(10, 37, 1) )
df
       id       Data Longitude Latitude
1  Rodney  22.937889 -79.41838 36.09853
2  Rodney   7.707515 -78.13063 35.88361
3  Rodney  51.701901 -76.59887 36.81712
4  Rodney  27.875812 -78.48248 36.35113
5  Rodney -15.005974 -78.88329 35.67284
6   Sarah  -3.620029 -78.90505 38.85388
7   Sarah  11.084854 -78.68379 36.68990
8   Sarah  45.467802 -79.02128 38.24296
9   Sarah -11.474493 -77.57469 37.70191
10  Sarah  11.441565 -75.74775 38.57646

Data Sources: De novo Creation

To Make to sf as POINT objects as normal.

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) -> pts
pts
Simple feature collection with 10 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -79.41838 ymin: 35.67284 xmax: -75.74775 ymax: 38.85388
Geodetic CRS:  WGS 84
       id       Data                   geometry
1  Rodney  22.937889 POINT (-79.41838 36.09853)
2  Rodney   7.707515 POINT (-78.13063 35.88361)
3  Rodney  51.701901 POINT (-76.59887 36.81712)
4  Rodney  27.875812 POINT (-78.48248 36.35113)
5  Rodney -15.005974 POINT (-78.88329 35.67284)
6   Sarah  -3.620029 POINT (-78.90505 38.85388)
7   Sarah  11.084854  POINT (-78.68379 36.6899)
8   Sarah  45.467802 POINT (-79.02128 38.24296)
9   Sarah -11.474493 POINT (-77.57469 37.70191)
10  Sarah  11.441565 POINT (-75.74775 38.57646)

Plotting Points

We can easily mix individual aesthetic properties to change the representaion of the points provided by using geom_sf().

 

ggplot( pts ) + 
  geom_sf( aes(color = id, size=Data)) + 
  coord_sf() + 
  theme(axis.text.x = element_text(angle = 45, 
                                   vjust = 0.5, 
                                   hjust=1))

 

 

Data Sources: De novo Creation

To make sf as LINESTRING we can group by the id value and then we need to summarize the Data component, then cast it to a Linestring.

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) %>%
  group_by( id ) %>%
  summarize( Data = mean( Data ) ) %>%
  st_cast("LINESTRING") -> lines 

 

lines
Simple feature collection with 2 features and 2 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -79.41838 ymin: 35.67284 xmax: -75.74775 ymax: 38.85388
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
  id      Data                                                          geometry
  <chr>  <dbl>                                                  <LINESTRING [°]>
1 Rodney  19.0 (-79.41838 36.09853, -78.13063 35.88361, -78.88329 35.67284, -78…
2 Sarah   10.6 (-78.68379 36.6899, -77.57469 37.70191, -79.02128 38.24296, -78.…

Visualizing Line Objects

Just like when we were working with POINT objects, we can use both built-in graphing approaches as well as ggplot() for visualizing. Here I’m goint to plot the two line objects using normal graphics.

plot( lines["Data"], 
      col=c("red","green"),
      lwd = 2)

 

 

Operations on LINESTRING Objects

Physical length of the LINESTRING objects

st_length( lines )
Units: [m]
[1] 449700.9 634051.9

The bounding box around all the lines.

st_bbox( lines )
     xmin      ymin      xmax      ymax 
-79.41838  35.67284 -75.74775  38.85388 

The area of the convex hull encompassing all the lines.

library( units )
lines %>% 
  st_union() %>% 
  st_convex_hull() %>% 
  st_area() %>%
  set_units( km^2 )
74575.83 [km^2]

Textual Versions of geometry

While it is easy to go from text \(\to\) POLYGON, the same thing can be done going in the opposite direction.

lines$geometry[1] %>% st_as_text()
[1] "LINESTRING (-79.41838 36.09853, -78.13063 35.88361, -78.88329 35.67284, -78.48248 36.35113, -76.59887 36.81712)"

Polygons

Polygons are simply lines whose first and last coordinate are exactly the same

df.p <- df[ c(1:5,1, 6:10,6),]
df.p
        id       Data Longitude Latitude
1   Rodney  22.937889 -79.41838 36.09853
2   Rodney   7.707515 -78.13063 35.88361
3   Rodney  51.701901 -76.59887 36.81712
4   Rodney  27.875812 -78.48248 36.35113
5   Rodney -15.005974 -78.88329 35.67284
1.1 Rodney  22.937889 -79.41838 36.09853
6    Sarah  -3.620029 -78.90505 38.85388
7    Sarah  11.084854 -78.68379 36.68990
8    Sarah  45.467802 -79.02128 38.24296
9    Sarah -11.474493 -77.57469 37.70191
10   Sarah  11.441565 -75.74775 38.57646
6.1  Sarah  -3.620029 -78.90505 38.85388

Polygon de novo Creation

df %>%
  st_as_sf( coords = c("Longitude", "Latitude"), crs=4326 ) %>%
  group_by( id ) %>%
  summarize( Data = mean( Data ) ) %>%
  st_cast("POLYGON") -> polygons

 

polygons
Simple feature collection with 2 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -79.41838 ymin: 35.67284 xmax: -75.74775 ymax: 38.85388
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
  id      Data                                                          geometry
  <chr>  <dbl>                                                     <POLYGON [°]>
1 Rodney  19.0 ((-79.41838 36.09853, -78.13063 35.88361, -78.88329 35.67284, -7…
2 Sarah   10.6 ((-78.68379 36.6899, -77.57469 37.70191, -79.02128 38.24296, -78…

Visualizing Polygon Objects

ggplot( polygons ) + 
  geom_sf( aes( fill = Data ), alpha=0.75 ) + coord_sf() + 
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1))

Polygons From Data Frames

map_data( "county", "virginia") %>%
  dplyr::select( Longitude = long,
                 Latitude = lat,
                 group,
                 County = subregion) -> va_counties
head( va_counties )
  Longitude Latitude group   County
1 -75.27519 38.03867     1 accomack
2 -75.21790 38.02721     1 accomack
3 -75.21790 38.02721     1 accomack
4 -75.24655 37.99283     1 accomack
5 -75.30384 37.94127     1 accomack
6 -75.31530 37.92981     1 accomack

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_point( size=0.25 ) + 
  coord_quickmap()

va_counties %>%
  filter( County %in%  c("hanover","henrico") ) %>%
  ggplot( aes(Longitude, Latitude) ) + 
  geom_polygon( aes( fill = County), alpha=0.1 ) +
  geom_point( aes( color = County) ) +
  coord_quickmap()

ggplot( va_counties, aes( Longitude, Latitude) ) + 
  geom_polygon( aes( group=group ), fill="grey80",
                color = "black", linewidth = 0.25) + 
  coord_quickmap()

ggplot() + 
  geom_polygon( aes( Longitude, Latitude, group=group ),
                fill="grey80",color = "black", 
                size = 0.25, data=va_counties) +
  geom_sf( aes( color=Data), 
           lwd=3, data=lines) + 
  coord_sf()

Shapefiles

The ESRI Shapefile is a non-standardized format for packaging vector based data (POINT, LINESTRING, POLYGON, etc.).

However, it is not actually a file but it is a collection of files, which may (or may not) expand directly in the folder or within a subfolder.

Archived ZIP Files.

I’ve uploaded some shapefiles to the Github Repository for this class. These are from the Richmond City GIS Department and represent the centerlines of roads in the city as well as the zoning districts.

Here are the URL’s for both of these files.

roads_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Centerlines-shp.zip"
district_url <- "https://github.com/dyerlab/ENVS-Lectures/raw/master/data/Zoning_Districts-shp.zip"

Downloading Files Using R

You can use your browser and copy and paste those addresses in to download the files to your computer. Then you’ll have to move those archives to your PROJECT FOLDER (you are using Projects right?).

OR

You can have R download the file and save it locally.

download.file( district_url , destfile = "./Districts.zip")
download.file( roads_url, destfile =  "./Roads.zip")

Unziping Files

You can open your computer finder and have the OS unzip the archives.

OR

You can have R do it.

unzip("Districts.zip")
unzip("Roads.zip")

File Components

The Projection File

Loading A Shapefile

You can load it in using sf::st_read() and passing it the .shp file path to it.

st_read("Zoning_Districts.shp") -> districts 
Reading layer `Zoning_Districts' from data source 
  `/Users/rodneydyer/Documents/Teaching/Shapefiles/Zoning_Districts.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 634 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11743500 ymin: 3687944 xmax: 11806060 ymax: 3744741
Projected CRS: NAD83 / Virginia South (ftUS)

Shapefiles Convert to sf Objects

By default, the data associated with a vector object are put into a data.frame and the geometry object is properly created.

head( districts, n=2 )
Simple feature collection with 2 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11773600 ymin: 3730159 xmax: 11789510 ymax: 3731016
Projected CRS: NAD83 / Virginia South (ftUS)
  OBJECTID Name Ordinance OrdinanceP Conditiona AdoptionDa Comment
1        1 RO-2      <NA>       <NA>         No 2000-01-01    <NA>
2        2  B-2      <NA>       <NA>         No 2000-01-01    <NA>
           CreatedBy CreatedDat             EditBy   EditDate
1 richard.morton_cor 2020-08-24 richard.morton_cor 2020-08-24
2 richard.morton_cor 2020-08-24 richard.morton_cor 2020-08-24
                              GlobalID Shape__Are Shape__Len
1 334799f0-fe38-46bf-97c2-260f5a036559   60150.29   983.6815
2 558df9cd-4f9c-4248-a689-bc2d9c79d060   56987.01   971.8832
                        geometry
1 MULTIPOLYGON (((11773598 37...
2 MULTIPOLYGON (((11789222 37...

Simplifying Data

names( districts )
 [1] "OBJECTID"   "Name"       "Ordinance"  "OrdinanceP" "Conditiona"
 [6] "AdoptionDa" "Comment"    "CreatedBy"  "CreatedDat" "EditBy"    
[11] "EditDate"   "GlobalID"   "Shape__Are" "Shape__Len" "geometry"  

Simplifying Data

districts %>% 
  dplyr::select( -Comment, -CreatedBy, -CreatedDat, -EditBy, -EditDate ) %>%
  dplyr::select( -Shape__Are, -Shape__Len) -> districts
head( districts, n=1)
Simple feature collection with 1 feature and 7 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 11773600 ymin: 3730159 xmax: 11773940 ymax: 3730493
Projected CRS: NAD83 / Virginia South (ftUS)
  OBJECTID Name Ordinance OrdinanceP Conditiona AdoptionDa
1        1 RO-2      <NA>       <NA>         No 2000-01-01
                              GlobalID                       geometry
1 334799f0-fe38-46bf-97c2-260f5a036559 MULTIPOLYGON (((11773598 37...

District Codes

districts$Name %>% unique() %>% sort()
 [1] "B-1"     "B-2"     "B-3"     "B-4"     "B-5"     "B-6"     "B-7"    
 [8] "CM"      "DCC"     "I"       "M-1"     "M-2"     "OS"      "R-1"    
[15] "R-2"     "R-3"     "R-4"     "R-43"    "R-48"    "R-5"     "R-53"   
[22] "R-6"     "R-63"    "R-7"     "R-73"    "R-8"     "R-MH"    "RF-1"   
[29] "RF-2"    "RO-1"    "RO-2"    "RO-3"    "RO2-PE2" "RO2-PE4" "RP"     
[36] "TOD-1"   "UB"      "UB-2"    "UB-PE2"  "UB-PE3"  "UB-PE4"  "UB-PE5" 
[43] "UB-PE6"  "UB-PE7"  "UB-PE8"  "UB-PO1"  "UB-PO2"  "UB-PO3"  "UB-PO4" 
[50] "UB2-PE8"

Visualizing

plot( districts["Name"] )

Visualizing Overlays

plot( st_geometry( districts ))
plot( st_geometry( districts[districts$OBJECTID==530,] ), 
      col='red', add=TRUE )

Spatial Joins

In the topic covering [Relational Operations], we used Primary and Foreign Keys to join data.frame objects and combine data. We can do simliar operations using geospatial positions to perform spatial joins.

There are a wide variety of operations available:

Check out the Cheetsheet

Secondary Vector Data Set

road.shapefile <- st_read("Centerlines-shp/tran_Carriageway.shp")
Reading layer `tran_Carriageway' from data source 
  `/Users/rodneydyer/Documents/Teaching/Shapefiles/Centerlines-shp/tran_Carriageway.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 29081 features and 25 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 11734060 ymin: 3682790 xmax: 11817490 ymax: 3751927
Projected CRS: NAD83 / Virginia South (ftUS)
names( road.shapefile )
 [1] "FID"        "Carriagewa" "AssetID"    "StreetType" "Functional"
 [6] "FIPS"       "LeftFromAd" "LeftToAddr" "RightFromA" "RightToAdd"
[11] "PrefixDire" "ProperName" "SuffixType" "SuffixDire" "FullName"  
[16] "RouteName"  "OneWay"     "PostedSpee" "CADRouteSp" "CreatedBy" 
[21] "CreatedDat" "EditBy"     "EditDate"   "GlobalID"   "SHAPE_Leng"
[26] "geometry"  

Check Projections

Dyer’s First Rule: Make Sure Projections Are the Same

st_crs( road.shapefile ) == st_crs( districts )
[1] TRUE

If they differed, one could do something like:

road.shapefile %>%
  st_transform( crs = st_crs( districts ) ) -> road.shapefile

Data Cleanup

road.shapefile %>%
  dplyr::select( FullName, OneWay, StreetType,
                 SpeedLimit = PostedSpee, Length = SHAPE_Leng,
                 geometry) %>%
  mutate( OneWay = factor( OneWay ),
          StreetType = factor( StreetType) ) -> roads
summary( roads )
   FullName          OneWay                    StreetType      SpeedLimit   
 Length:29081       1   :    2   Alley              : 3139   Min.   : 0.00  
 Class :character   FT  : 3225   Artery             : 5201   1st Qu.:25.00  
 Mode  :character   TF  : 3148   Highway            :  384   Median :25.00  
                    NA's:22706   Highway Interchange:   93   Mean   :25.17  
                                 Private            : 1540   3rd Qu.:25.00  
                                 Ramp               :  378   Max.   :55.00  
                                 Secondary          :18346   NA's   :7959   
     Length                   geometry    
 Min.   :    2.943   LINESTRING   :29081  
 1st Qu.:  164.849   epsg:2284    :    0  
 Median :  292.517   +proj=lcc ...:    0  
 Mean   :  379.338                        
 3rd Qu.:  461.146                        
 Max.   :17851.326                        
                                          

Visualizing: Filter Highways & Plot

roads %>%
  filter( StreetType %in% c("Highway", 
                            "Highway Interchange",
                            "Ramp")) -> highways
summary( highways )
   FullName          OneWay                  StreetType    SpeedLimit   
 Length:855         1   :  0   Alley              :  0   Min.   : 0.00  
 Class :character   FT  :416   Artery             :  0   1st Qu.:25.00  
 Mode  :character   TF  :408   Highway            :384   Median :25.00  
                    NA's: 31   Highway Interchange: 93   Mean   :36.19  
                               Private            :  0   3rd Qu.:55.00  
                               Ramp               :378   Max.   :55.00  
                               Secondary          :  0   NA's   :131    
     Length                  geometry  
 Min.   :   14.46   LINESTRING   :855  
 1st Qu.:  276.48   epsg:2284    :  0  
 Median :  726.13   +proj=lcc ...:  0  
 Mean   : 1055.15                      
 3rd Qu.: 1170.35                      
 Max.   :17851.33                      
                                       

Plot It

Let’s color this based upon SpeedLimit

Visualizing A Single Entity

Using Normal Filtering

roads %>% 
  filter( FullName == "Three Chopt Road") %>%
  ggplot() + 
  geom_sf( aes(color=SpeedLimit), 
           lwd=2) + 
  coord_sf() + theme(axis.text.x = element_text(angle = 45) )

Intersection Plotting

Let’s grab a bit of zoning from The Fan

districts %>%
  filter( OBJECTID == 368 ) %>%
  st_buffer(dist = 1500) %>%
  st_bbox() -> fan_bbox

districts %>%
  st_crop( fan_bbox ) -> theFan 

plot( theFan["Name"])

Add Auxillary Data

zone_url <- "https://raw.githubusercontent.com/dyerlab/ENVS-Lectures/master/data/DistrictCodes.csv"

theFan %>%
  left_join( read_csv( zone_url ),
             by="Name") %>%
  mutate( Category = factor( Category) ) %>%
  select( OBJECTID, 
          Name, 
          Category, 
          everything() )  -> theFan

 

ggplot( theFan ) + geom_sf( aes( fill=Category)) + 
  scale_fill_brewer( type="qual", palette = "Set3") + theme_void()

Cropping Roads

roads %>% st_crop( st_bbox( theFan )) -> fanRoads
plot( st_geometry( fanRoads ))

Selecting a Single Zone

Add an attribute to the POLYGON’s indicating if the district is that big one in the middle of the plot. Let’s plot it to see what the ID is.

theFan %>%
  mutate( Target = ifelse( OBJECTID == 368, 
                           TRUE, 
                           FALSE) ) -> theFan

Plot it

theFan %>%
  ggplot() + 
  geom_sf( aes(fill=Target) ) + 
  geom_sf_text( aes(label=OBJECTID), size=3 ) +
  coord_sf() + theme_void()

Isolate the District

target <- theFan[ theFan$OBJECTID == 368, ]
plot( st_geometry( target ) ) 

 

fanRoads %>%
  filter( st_intersects( fanRoads, target, 
                         sparse = FALSE ) == TRUE ) %>%
  as_data_frame() %>% select( `Street Name` = FullName ) %>%
  arrange( `Street Name`) %>% unique() 
# A tibble: 39 × 1
   `Street Name` 
   <chr>         
 1 Allison St    
 2 Birch St      
 3 Boyd St       
 4 Floyd Ave     
 5 Grove Ave     
 6 Hanover Ave   
 7 Kensington Ave
 8 Lombardy Pl   
 9 Madumbie Lane 
10 Monument Ave  
# ℹ 29 more rows

 

fanRoads %>%
  filter( st_intersects( fanRoads, target, 
                         sparse = FALSE ) == TRUE ) %>%
  ggplot() +
  geom_sf( data=target  ) +
  geom_sf( color="darkgreen" ) + theme_void()

Questions

If you have any questions, please feel free to either post them as an “Issue” on your copy of this GitHub Repository, post to the Canvas discussion board for the class, or drop me an email.

Peter Sellers looking bored